This R markdown file was auto-generated by the iDEP website. It is assumed that users have analyzed their data with iDEP by clicking through all the tabs and have downloaded the related files to a folder.
iDEP Service Used: iDEP 0.91 http://ge-lab.org/idep/, originally by Steven Xijin.Ge@sdstate.edu
Ge SX, Son EW, Yao R: iDEP: an integrated web application for differential expression and pathway analysis of RNA-Seq data. BMC Bioinformatics 2018, 19(1):534. PMID:30567491
First we set up the working directory to where the files are saved.
setwd('~/Documents/HTML_R/GLDS37') # Needs to be changed
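Before running the rest of the script, it can help to confirm that the downloaded files are actually in the working directory. The sketch below is only an illustration: `check_input_files` is a hypothetical helper, not part of iDEP, and it assumes the files sit directly in the given folder.

```r
# Hypothetical helper (not part of iDEP): report which expected files
# are missing from a directory.
check_input_files <- function(files, dir = getwd()) {
  paths <- file.path(dir, files)
  present <- file.exists(paths)
  if (!all(present)) {
    warning("Missing files: ", paste(files[!present], collapse = ", "))
  }
  # Named logical vector: TRUE if the file was found
  stats::setNames(present, files)
}
```

For this report it could be called as `check_input_files(c('GLDS37_Expression.csv', 'GLDS37_Metadata.csv'))`.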
R packages and iDEP core functions. Users can also download the iDEP_core_functions.R file. Many R packages need to be installed first. This may take hours. Each of these packages took years to develop, so be a patient thief. Sometimes dependencies need to be installed manually. If you are using an older version of R and have trouble with package installation, try uninstalling the current version of R, deleting all of its folders and files (e.g., C:/Program Files/R/R-3.4.3), and reinstalling from scratch.
if(file.exists('iDEP_core_functions.R'))
source('iDEP_core_functions.R') else
source('https://raw.githubusercontent.com/iDEP-SDSU/idep/master/shinyapps/idep/iDEP_core_functions.R')
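Since package installation is the slow step, one way to see what is still missing before sourcing the core functions is sketched below. `missing_packages` is a hypothetical helper, and the package vector is only an illustrative subset of packages mentioned in this report; the full list is the one loaded by iDEP_core_functions.R.

```r
# Hypothetical helper (not part of iDEP): list packages that are not yet
# installed, without attaching any of them.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Illustrative subset only; the full list is defined in iDEP_core_functions.R
pkgs <- c("stats", "knitr", "DESeq2", "edgeR")
to_install <- missing_packages(pkgs)
print(to_install)
# CRAN packages can then be installed with install.packages(), and
# Bioconductor packages with BiocManager::install().
```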
We are using the downloaded gene expression file in which gene IDs have been converted to Ensembl gene IDs. This is because the ID conversion database is too large to download. You can use your original file if it uses Ensembl IDs, or if you do not want to use the pathway files available in iDEP (or they are not available).
inputFile <- 'GLDS37_Expression.csv' # Expression matrix
sampleInfoFile <- 'GLDS37_Metadata.csv' # Experiment design file
geneInfoFile <- 'Arabidopsis_thaliana__athaliana_eg_gene_GeneInfo.csv' #Gene symbols, location etc.
geneSetFile <- 'Arabidopsis_thaliana__athaliana_eg_gene.db' # pathway database in SQL; can be GMT format
STRING10_speciesFile <- 'https://raw.githubusercontent.com/iDEP-SDSU/idep/master/shinyapps/idep/STRING10_species.csv'
Parameters for reading data
input_missingValue <- 'geneMedian' #Missing values imputation method
input_dataFileFormat <- 1 #1- read counts, 2 FPKM/RPKM or DNA microarray
input_minCounts <- 0.5 #Min counts
input_NminSamples <- 1 #Minimum number of samples
input_countsLogStart <- 4 #Pseudo count for log CPM
input_CountsTransform <- 1 #Methods for data transformation of counts. 1-EdgeR's logCPM 2-VST, 3-rlog
readData.out <- readData(inputFile)
## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors
library(knitr) # install if needed. for showing tables with kable
kable( head(readData.out$data) ) # show the first few rows of data
| | Col0_FLT_Rep1 | Col0_FLT_Rep2 | Col0_FLT_Rep3 | Col0_FLT_Rep4 | Col0_FLT_Rep5 | Col0_FLT_Rep6 | Col0_FLT_Rep7 | Col0_FLT_Rep8 | Col0_GC_Rep1 | Col0_GC_Rep2 | Col0_GC_Rep3 | Col0_GC_Rep4 | Col0_GC_Rep5 | Col0_GC_Rep6 | Col0_GC_Rep7 | Col0_GC_Rep8 | Cvi0_FLT_Rep1 | Cvi0_FLT_Rep2 | Cvi0_FLT_Rep3 | Cvi0_FLT_Rep4 | Cvi0_FLT_Rep5 | Cvi0_FLT_Rep6 | Cvi0_GC_Rep1 | Cvi0_GC_Rep2 | Cvi0_GC_Rep3 | Cvi0_GC_Rep4 | Cvi0_GC_Rep5 | Cvi0_GC_Rep6 | Ler0_FLT_Rep1 | Ler0_FLT_Rep2 | Ler0_FLT_Rep3 | Ler0_FLT_Rep4 | Ler0_FLT_Rep5 | Ler0_FLT_Rep6 | Ler0_GC_Rep1 | Ler0_GC_Rep2 | Ler0_GC_Rep3 | Ler0_GC_Rep4 | Ler0_GC_Rep5 | Ler0_GC_Rep6 | Ws2_FLT_Rep1 | Ws2_FLT_Rep2 | Ws2_FLT_Rep3 | Ws2_FLT_Rep4 | Ws2_FLT_Rep5 | Ws2_FLT_Rep6 | Ws2_FLT_Rep7 | Ws2_FLT_Rep8 | Ws2_GC_Rep1 | Ws2_GC_Rep2 | Ws2_GC_Rep3 | Ws2_GC_Rep4 | Ws2_GC_Rep5 | Ws2_GC_Rep6 | Ws2_GC_Rep7 | Ws2_GC_Rep8 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AT2G41310 | 17.02215 | 17.01796 | 17.11739 | 16.48040 | 17.91059 | 17.26694 | 19.69218 | 19.20036 | 17.56076 | 17.72695 | 17.07714 | 18.37849 | 19.95574 | 19.38143 | 18.83440 | 19.12361 | 17.73175 | 17.44373 | 17.19451 | 19.12470 | 19.43718 | 19.58092 | 17.26674 | 17.25639 | 17.64822 | 19.63789 | 18.11876 | 20.17007 | 17.05219 | 17.65309 | 16.62785 | 20.37468 | 19.15037 | 18.03638 | 17.07419 | 18.36805 | 16.57124 | 19.70836 | 19.61946 | 20.43292 | 17.96129 | 19.66522 | 18.71354 | 19.00023 | 20.36251 | 21.61254 | 21.09230 | 17.93576 | 18.56609 | 18.10778 | 18.84677 | 18.34000 | 20.34275 | 21.00643 | 18.05459 | 18.47575 |
| ATCG00020 | 16.62437 | 17.17358 | 19.19315 | 17.01808 | 17.00478 | 18.04289 | 18.66012 | 18.34950 | 17.07015 | 18.45830 | 17.53183 | 17.54468 | 17.76015 | 18.99225 | 17.71870 | 18.08240 | 20.40519 | 21.10712 | 20.35049 | 20.54416 | 21.03219 | 20.10172 | 18.97992 | 17.80879 | 18.09481 | 19.14103 | 18.69038 | 18.63707 | 19.47328 | 18.93102 | 20.12098 | 19.48839 | 19.30928 | 19.93716 | 18.69104 | 17.14268 | 18.52542 | 19.24912 | 17.33441 | 18.56512 | 19.42824 | 18.91051 | 18.40429 | 18.42327 | 19.77389 | 18.61999 | 18.81388 | 18.75576 | 18.83601 | 16.97496 | 16.25988 | 18.60692 | 19.16008 | 17.90282 | 16.11539 | 17.61189 |
| ATCG00490 | 16.26692 | 16.68090 | 18.18697 | 17.15535 | 16.37882 | 17.29286 | 16.89119 | 17.94238 | 16.17645 | 17.48932 | 17.34631 | 17.44971 | 17.57745 | 17.66614 | 17.15341 | 17.90495 | 19.79300 | 20.45193 | 19.76313 | 19.77044 | 19.97784 | 19.27952 | 18.62011 | 17.41364 | 17.78610 | 18.62409 | 18.23437 | 18.10593 | 19.34302 | 19.30219 | 20.03934 | 19.20705 | 19.52952 | 19.63502 | 18.59304 | 17.10605 | 18.02425 | 19.69438 | 17.36523 | 18.63291 | 19.28034 | 18.37439 | 18.11924 | 18.45781 | 19.29917 | 18.23765 | 18.44432 | 18.19475 | 18.72367 | 16.56960 | 15.78959 | 17.92674 | 18.84758 | 17.51991 | 16.18827 | 17.89627 |
| ATCG00530 | 13.76395 | 14.01048 | 14.89293 | 15.23116 | 14.04617 | 13.85371 | 15.29187 | 12.60741 | 15.35153 | 13.92719 | 13.69533 | 15.25500 | 16.41620 | 13.83225 | 13.61397 | 16.25125 | 14.23795 | 14.81088 | 14.57108 | 15.10085 | 14.76077 | 13.92069 | 13.94028 | 13.73958 | 13.90483 | 13.68475 | 13.84517 | 13.85507 | 14.20240 | 14.24318 | 13.94380 | 15.47188 | 14.89890 | 14.54820 | 14.21203 | 15.39214 | 15.26377 | 14.20046 | 13.78132 | 16.23040 | 16.26147 | 17.29414 | 16.61576 | 16.69669 | 17.20347 | 19.35494 | 18.03140 | 19.94179 | 14.88256 | 16.18132 | 16.92704 | 15.15767 | 15.39250 | 15.08054 | 15.84510 | 14.21931 |
| ATCG00740 | 14.73210 | 14.94219 | 14.96504 | 16.04677 | 14.54676 | 14.17489 | 15.18165 | 13.11581 | 15.64299 | 13.96589 | 13.91204 | 15.53611 | 17.04230 | 13.79330 | 14.20803 | 16.25587 | 15.05854 | 14.88059 | 14.75081 | 14.86381 | 14.25578 | 13.70575 | 13.68758 | 13.93228 | 14.10991 | 13.79351 | 13.79747 | 14.30696 | 14.88986 | 15.22620 | 14.16816 | 15.53573 | 14.88442 | 14.69631 | 15.62069 | 16.03472 | 16.02725 | 14.50385 | 14.76714 | 16.70364 | 16.58074 | 18.22371 | 17.31301 | 17.44096 | 17.00944 | 18.87289 | 17.40664 | 19.25440 | 15.67269 | 16.61073 | 17.15054 | 15.90107 | 15.14634 | 15.87748 | 15.79090 | 15.44628 |
| ATCG00650 | 13.96212 | 13.97166 | 14.27354 | 14.49403 | 14.36924 | 13.62418 | 14.69982 | 12.46797 | 14.94863 | 12.81904 | 12.93448 | 14.71847 | 15.66234 | 12.58834 | 13.64333 | 15.52227 | 14.05529 | 13.79046 | 14.04351 | 13.94000 | 13.25782 | 12.82255 | 12.96584 | 13.25719 | 12.99900 | 13.09773 | 12.91592 | 13.77193 | 14.15583 | 14.28258 | 13.15073 | 14.61523 | 14.36325 | 13.67559 | 14.85099 | 15.08337 | 15.65327 | 13.61956 | 14.15712 | 16.15821 | 16.43927 | 17.93933 | 17.06040 | 16.89910 | 16.60920 | 18.72510 | 17.42999 | 18.99587 | 15.30840 | 16.78608 | 16.80205 | 15.32805 | 14.87381 | 15.78032 | 15.27499 | 13.68067 |
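The values in the table above are on a log2 scale. As a rough illustration of what the reading parameters control, the sketch below filters a toy count matrix and log-transforms it in base R. It assumes the filter is applied to counts per million (CPM) and that the transform is log2(CPM + pseudo count); iDEP's own readData() is not shown in this report and may differ in detail. All numbers are made up.

```r
# Toy count matrix: 3 genes x 2 samples (made-up numbers)
counts <- matrix(c(10, 0, 990,
                   20, 0, 980),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))

min_cpm <- 0.5  # plays the role of input_minCounts
n_min   <- 1    # plays the role of input_NminSamples
pseudo  <- 4    # plays the role of input_countsLogStart

# Counts per million: scale each column by its library size
cpm <- t(t(counts) / colSums(counts)) * 1e6

# Keep genes reaching min_cpm in at least n_min samples
keep <- rowSums(cpm >= min_cpm) >= n_min

# Log-transform with a pseudo count to avoid log2(0)
log_cpm <- log2(cpm[keep, , drop = FALSE] + pseudo)
print(round(log_cpm, 2))
```

A larger pseudo count shrinks the differences among lowly expressed genes, which is the usual reason for raising it.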
readSampleInfo.out <- readSampleInfo(sampleInfoFile)
kable( readSampleInfo.out )
| | Gravity | Variety |
|---|---|---|
| Col0_FLT_Rep1 | Microgravity | Col0 WT |
| Col0_FLT_Rep2 | Microgravity | Col0 WT |
| Col0_FLT_Rep3 | Microgravity | Col0 WT |
| Col0_FLT_Rep4 | Microgravity | Col0 WT |
| Col0_FLT_Rep5 | Microgravity | Col0 WT |
| Col0_FLT_Rep6 | Microgravity | Col0 WT |
| Col0_FLT_Rep7 | Microgravity | Col0 WT |
| Col0_FLT_Rep8 | Microgravity | Col0 WT |
| Col0_GC_Rep1 | Terrestrial | Col0 WT |
| Col0_GC_Rep2 | Terrestrial | Col0 WT |
| Col0_GC_Rep3 | Terrestrial | Col0 WT |
| Col0_GC_Rep4 | Terrestrial | Col0 WT |
| Col0_GC_Rep5 | Terrestrial | Col0 WT |
| Col0_GC_Rep6 | Terrestrial | Col0 WT |
| Col0_GC_Rep7 | Terrestrial | Col0 WT |
| Col0_GC_Rep8 | Terrestrial | Col0 WT |
| Cvi0_FLT_Rep1 | Microgravity | Cvi0 WT |
| Cvi0_FLT_Rep2 | Microgravity | Cvi0 WT |
| Cvi0_FLT_Rep3 | Microgravity | Cvi0 WT |
| Cvi0_FLT_Rep4 | Microgravity | Cvi0 WT |
| Cvi0_FLT_Rep5 | Microgravity | Cvi0 WT |
| Cvi0_FLT_Rep6 | Microgravity | Cvi0 WT |
| Cvi0_GC_Rep1 | Terrestrial | Cvi0 WT |
| Cvi0_GC_Rep2 | Terrestrial | Cvi0 WT |
| Cvi0_GC_Rep3 | Terrestrial | Cvi0 WT |
| Cvi0_GC_Rep4 | Terrestrial | Cvi0 WT |
| Cvi0_GC_Rep5 | Terrestrial | Cvi0 WT |
| Cvi0_GC_Rep6 | Terrestrial | Cvi0 WT |
| Ler0_FLT_Rep1 | Microgravity | Ler0 WT |
| Ler0_FLT_Rep2 | Microgravity | Ler0 WT |
| Ler0_FLT_Rep3 | Microgravity | Ler0 WT |
| Ler0_FLT_Rep4 | Microgravity | Ler0 WT |
| Ler0_FLT_Rep5 | Microgravity | Ler0 WT |
| Ler0_FLT_Rep6 | Microgravity | Ler0 WT |
| Ler0_GC_Rep1 | Terrestrial | Ler0 WT |
| Ler0_GC_Rep2 | Terrestrial | Ler0 WT |
| Ler0_GC_Rep3 | Terrestrial | Ler0 WT |
| Ler0_GC_Rep4 | Terrestrial | Ler0 WT |
| Ler0_GC_Rep5 | Terrestrial | Ler0 WT |
| Ler0_GC_Rep6 | Terrestrial | Ler0 WT |
| Ws2_FLT_Rep1 | Microgravity | WS2 WT |
| Ws2_FLT_Rep2 | Microgravity | WS2 WT |
| Ws2_FLT_Rep3 | Microgravity | WS2 WT |
| Ws2_FLT_Rep4 | Microgravity | WS2 WT |
| Ws2_FLT_Rep5 | Microgravity | WS2 WT |
| Ws2_FLT_Rep6 | Microgravity | WS2 WT |
| Ws2_FLT_Rep7 | Microgravity | WS2 WT |
| Ws2_FLT_Rep8 | Microgravity | WS2 WT |
| Ws2_GC_Rep1 | Terrestrial | WS2 WT |
| Ws2_GC_Rep2 | Terrestrial | WS2 WT |
| Ws2_GC_Rep3 | Terrestrial | WS2 WT |
| Ws2_GC_Rep4 | Terrestrial | WS2 WT |
| Ws2_GC_Rep5 | Terrestrial | WS2 WT |
| Ws2_GC_Rep6 | Terrestrial | WS2 WT |
| Ws2_GC_Rep7 | Terrestrial | WS2 WT |
| Ws2_GC_Rep8 | Terrestrial | WS2 WT |
input_selectOrg ="NEW"
input_selectGO <- 'GOBP' #Gene set category
input_noIDConversion = TRUE
allGeneInfo.out <- geneInfo(geneInfoFile)
converted.out = NULL
convertedData.out <- convertedData()
nGenesFilter()
## [1] "16156 genes in 56 samples. 16121 genes passed filter.\n Original gene IDs used."
convertedCounts.out <- convertedCounts() # converted counts, just for compatibility
# Read counts per library
parDefault = par()
par(mar=c(12,4,2,2))
# barplot of total read counts
x <- readData.out$rawCounts
groups = as.factor( detectGroups(colnames(x ) ) )
if(nlevels(groups)<=1 | nlevels(groups) >20 )
col1 = 'green' else
col1 = rainbow(nlevels(groups))[ groups ]
barplot( colSums(x)/1e6,
col=col1,las=3, main="Total read counts (millions)")
readCountsBias() # detecting bias in sequencing depth
## [1] 5.946657e-07
## [1] 0.0008267088
## [1] 0.003814545
## [1] "Warning! Sequencing depth bias detected. Total read counts are significantly different among sample groups (p= 5.95e-07 ) based on ANOVA. Total read counts seem to be correlated with factor Gravity (p= 8.27e-04 ). Total read counts seem to be correlated with factor Variety (p= 3.81e-03 ). "
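The message above says the depth check is based on ANOVA. A minimal sketch of that kind of test is given below, using made-up library sizes for two groups; the internals of readCountsBias() are not shown in this report, so this is an approximation of the idea rather than its implementation.

```r
# Made-up library sizes (millions of reads) for two groups of four samples
total_reads <- c(10.1, 9.8, 10.3, 10.0,   # group A
                 14.9, 15.2, 15.1, 14.8)  # group B
group <- factor(rep(c("A", "B"), each = 4))

# One-way ANOVA: does mean library size differ among groups?
fit <- aov(total_reads ~ group)
p_value <- summary(fit)[[1]][["Pr(>F)"]][1]

if (p_value < 0.01) {
  message("Library sizes differ among groups (p = ",
          format(p_value, digits = 3), ")")
}
```

When such a bias is flagged, it is worth checking whether the normalization method in use corrects for library size before interpreting group differences.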
# Box plot
x = readData.out$data
boxplot(x, las = 2, col=col1,
ylab='Transformed expression levels',
main='Distribution of transformed data')
#Density plot
par(parDefault)
## Warning in par(parDefault): graphical parameter "cin" cannot be set
## Warning in par(parDefault): graphical parameter "cra" cannot be set
## Warning in par(parDefault): graphical parameter "csi" cannot be set
## Warning in par(parDefault): graphical parameter "cxy" cannot be set
## Warning in par(parDefault): graphical parameter "din" cannot be set
## Warning in par(parDefault): graphical parameter "page" cannot be set
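These warnings arise because par() with no arguments also returns read-only parameters such as cin and page, which cannot be set again. A common way to avoid them is to save only the settable parameters with `par(no.readonly = TRUE)`. The sketch below uses a null PDF device so that it runs without opening a window; the variable names are illustrative.

```r
pdf(NULL)                           # null device: no file or window is created
old_par <- par(no.readonly = TRUE)  # save only the settable parameters
par(mar = c(12, 4, 2, 2))           # widen the bottom margin for long labels
# ... plotting code would go here ...
par(old_par)                        # restore without "cannot be set" warnings
restored_mar <- par("mar")
invisible(dev.off())
```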
densityPlot()
# Scatter plot of the first two samples
plot(x[,1:2],xlab=colnames(x)[1],ylab=colnames(x)[2],
main='Scatter plot of first two samples')
####plot gene or gene family
input_selectOrg ="BestMatch"
input_geneSearch <- 'HOXA' #Gene ID for searching
genePlot()
## NULL
input_useSD <- 'FALSE' #Use standard deviation instead of standard error in error bar?
geneBarPlotError()
## NULL
# hierarchical clustering tree
x <- readData.out$data
maxGene <- apply(x,1,max)
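# intended to remove the bottom 25% of lowly expressed genes, which inflate the PCC;
# note that quantile(maxGene)[1] is the 0% quantile (the minimum), not the 25% quantile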
x <- x[which(maxGene > quantile(maxGene)[1] ) ,]
plot(as.dendrogram(hclust2( dist2(t(x)))), ylab="1 - Pearson C.C.", type = "rectangle")
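The sketch below illustrates two points on toy data: how the elements of quantile() map to cutoffs, and how a sample tree can be built from 1 - Pearson correlation. The definitions of hclust2() and dist2() are not shown in this report; the correlation-based distance is inferred from the axis label, and average linkage is an assumption.

```r
set.seed(1)
# Toy matrix: 200 genes x 6 samples of made-up expression values
x <- matrix(rnorm(200 * 6, mean = 8, sd = 2), nrow = 200,
            dimnames = list(NULL, paste0("S", 1:6)))
max_gene <- apply(x, 1, max)

# quantile() with default probs returns the 0%, 25%, 50%, 75%, 100% quantiles,
# so element [1] is the minimum and element [2] is the 25% cutoff
q <- quantile(max_gene)
kept_min <- sum(max_gene > q[1])                      # drops only the minimum
kept_q25 <- sum(max_gene > quantile(max_gene, 0.25))  # drops the bottom 25%

# Distance between samples as 1 - Pearson correlation, then average linkage
x25 <- x[max_gene > quantile(max_gene, 0.25), ]
d <- as.dist(1 - cor(x25, method = "pearson"))
tree <- hclust(d, method = "average")
plot(as.dendrogram(tree), ylab = "1 - Pearson C.C.")
```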
#Correlation matrix
input_labelPCC <- TRUE #Show correlation coefficient?
correlationMatrix()
# Parameters for heatmap
input_nGenes <- 1000 #Top genes for heatmap
input_geneCentering <- TRUE #centering genes ?
input_sampleCentering <- FALSE #Center by sample?
input_geneNormalize <- FALSE #Normalize by gene?
input_sampleNormalize <- FALSE #Normalize by sample?
input_noSampleClustering <- FALSE #Use original sample order
input_heatmapCutoff <- 4 #Remove outliers beyond number of SDs
input_distFunctions <- 1 #Which distance function to use
input_hclustFunctions <- 1 #Linkage type
input_heatColors1 <- 1 #Colors
input_selectFactorsHeatmap <- 'Gravity' #Sample coloring factors
png('heatmap.png', width = 10, height = 15, units = 'in', res = 300)
staticHeatmap()
dev.off()
## png
## 2
![heatmap](heatmap.png)
heatmapPlotly() # interactive heatmap using Plotly
input_nGenesKNN <- 2000 #Number of genes for k-Means
input_nClusters <- 9 #Number of clusters
maxGeneClustering = 12000
input_kmeansNormalization <- 'geneMean' #Normalization
input_KmeansReRun <- 0 #Random seed
distributionSD() #Distribution of standard deviations
KmeansNclusters() #Number of clusters
Kmeans.out = Kmeans() #Running K-means
KmeansHeatmap() #Heatmap for k-Means
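The k-means steps above can be approximated in base R as follows. This is a sketch on toy data, assuming that genes are ranked by standard deviation and that 'geneMean' normalization means subtracting each gene's mean; the iDEP Kmeans() function itself is not shown in this report.

```r
set.seed(0)  # fixed seed so the clustering is reproducible
# Toy matrix: 300 genes x 8 samples (made-up values)
x <- matrix(rnorm(300 * 8), nrow = 300)
x[1:100, 1:4] <- x[1:100, 1:4] + 3   # one block of co-regulated genes

n_top      <- 200   # plays the role of input_nGenesKNN
n_clusters <- 3     # plays the role of input_nClusters

# Rank genes by standard deviation and keep the most variable ones
sds <- apply(x, 1, sd)
top <- x[order(sds, decreasing = TRUE)[seq_len(n_top)], ]

# Center each gene on its mean (one reading of 'geneMean' normalization)
top_centered <- top - rowMeans(top)

km <- kmeans(top_centered, centers = n_clusters, nstart = 10)
table(km$cluster)
```

Because k-means starts from random centers, fixing the seed (or using several starts via `nstart`) keeps cluster assignments stable between runs.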
#Read gene sets for enrichment analysis
sqlite <- dbDriver('SQLite')
input_selectGO3 <- 'GOBP' #Gene set category
input_minSetSize <- 15 #Min gene set size
input_maxSetSize <- 2000 #Max gene set size
GeneSets.out <-readGeneSets( geneSetFile,
convertedData.out, input_selectGO3,input_selectOrg,
c(input_minSetSize, input_maxSetSize) )
# Alternatively, users can use their own GMT files by
#GeneSets.out <- readGMTRobust('somefile.GMT')
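For readers unfamiliar with the GMT format: each line holds a gene set name, a description, and then the member genes, all separated by tabs. The sketch below parses such a file in base R. `read_gmt_sketch` is a hypothetical helper written for illustration and is not the iDEP readGMTRobust() function, whose behavior may differ.

```r
# Hypothetical helper (not the iDEP readGMTRobust function): parse a GMT file
# into a named list of gene vectors.
read_gmt_sketch <- function(path) {
  lines  <- readLines(path, warn = FALSE)
  lines  <- lines[nzchar(trimws(lines))]   # skip blank lines
  fields <- strsplit(lines, "\t", fixed = TRUE)
  sets   <- lapply(fields, function(f) {
    genes <- f[-(1:2)]                     # drop name and description
    unique(genes[nzchar(genes)])
  })
  names(sets) <- vapply(fields, `[`, character(1), 1)
  sets
}

# Write a two-set example file and read it back
gmt_file <- tempfile(fileext = ".gmt")
writeLines(c("SET_A\tdescription A\tAT1G01010\tAT1G01020",
             "SET_B\tdescription B\tAT2G41310"), gmt_file)
gene_sets <- read_gmt_sketch(gmt_file)
str(gene_sets)
```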
results <- KmeansGO() #Enrichment analysis for k-Means clusters
results$adj.Pval <- format( results$adj.Pval,digits=3 )
kable( results, row.names=FALSE)
| Cluster | adj.Pval | Genes | Pathways |
|---|---|---|---|
| A | 3.10e-09 | 32 | Response to abiotic stimulus |
| | 2.95e-07 | 23 | Response to external stimulus |
| | 1.24e-05 | 19 | Organic acid metabolic process |
| | 1.24e-05 | 25 | Response to organic substance |
| | 1.24e-05 | 13 | Monocarboxylic acid metabolic process |
| | 1.24e-05 | 22 | Response to oxygen-containing compound |
| | 1.67e-05 | 14 | Response to osmotic stress |
| | 1.67e-05 | 22 | Response to hormone |
| | 1.98e-05 | 22 | Response to endogenous stimulus |
| | 2.54e-05 | 13 | Response to salt stress |
| B | 2.45e-101 | 71 | Photosynthesis |
| | 4.64e-54 | 39 | Photosynthesis, light reaction |
| | 1.25e-46 | 49 | Generation of precursor metabolites and energy |
| | 3.34e-34 | 22 | Photosynthetic electron transport chain |
| | 3.91e-30 | 29 | Electron transport chain |
| | 4.00e-30 | 60 | Organonitrogen compound biosynthetic process |
| | 5.11e-29 | 55 | Oxidation-reduction process |
| | 3.30e-27 | 62 | Response to abiotic stimulus |
| | 1.70e-25 | 17 | Protein-chromophore linkage |
| | 1.06e-21 | 35 | Response to light stimulus |
| C | 3.64e-42 | 133 | Response to abiotic stimulus |
| | 1.71e-38 | 123 | Response to organic substance |
| | 5.68e-35 | 108 | Response to endogenous stimulus |
| | 5.68e-35 | 107 | Response to hormone |
| | 6.32e-33 | 79 | Response to inorganic substance |
| | 2.63e-31 | 57 | Response to metal ion |
| | 9.08e-31 | 121 | Multicellular organism development |
| | 9.87e-28 | 47 | Response to cadmium ion |
| | 3.10e-25 | 77 | Regulation of biological quality |
| | 5.04e-24 | 93 | Organonitrogen compound biosynthetic process |
| D | 3.23e-41 | 71 | Response to abiotic stimulus |
| | 7.12e-37 | 60 | Response to oxygen-containing compound |
| | 5.23e-30 | 48 | Response to acid chemical |
| | 8.16e-27 | 30 | Response to water |
| | 1.12e-25 | 29 | Response to water deprivation |
| | 1.44e-25 | 40 | Response to inorganic substance |
| | 7.96e-25 | 24 | Cellular response to decreased oxygen levels |
| | 7.96e-25 | 24 | Cellular response to oxygen levels |
| | 7.96e-25 | 24 | Cellular response to hypoxia |
| | 8.84e-25 | 47 | Cellular response to chemical stimulus |
| F | 8.57e-25 | 108 | Response to organic substance |
| | 1.97e-24 | 92 | Cellular response to chemical stimulus |
| | 3.58e-24 | 94 | Response to oxygen-containing compound |
| | 5.09e-24 | 110 | Response to abiotic stimulus |
| | 6.52e-24 | 95 | Response to hormone |
| | 1.94e-23 | 95 | Response to endogenous stimulus |
| | 5.23e-21 | 99 | Cell communication |
| | 3.59e-20 | 79 | Response to external stimulus |
| | 9.24e-20 | 73 | Response to acid chemical |
| | 3.26e-19 | 89 | Signal transduction |
| H | 1.24e-32 | 76 | Organonitrogen compound biosynthetic process |
| | 3.98e-29 | 52 | Amide biosynthetic process |
| | 5.82e-29 | 55 | Cellular amide metabolic process |
| | 1.77e-27 | 50 | Peptide metabolic process |
| | 2.11e-27 | 48 | Translation |
| | 2.18e-27 | 48 | Peptide biosynthetic process |
| | 1.04e-17 | 38 | Drug metabolic process |
| | 1.52e-15 | 26 | Response to cadmium ion |
| | 9.81e-15 | 53 | Small molecule metabolic process |
| | 2.20e-14 | 28 | Response to metal ion |
| I | 2.45e-05 | 5 | Water transport |
| | 2.45e-05 | 5 | Fluid transport |
| | 1.97e-04 | 14 | Transmembrane transport |
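Over-representation results like those in this table are commonly obtained from a hypergeometric test followed by a multiple-testing correction. The sketch below shows that calculation for a single made-up cluster and pathway. How KmeansGO() computes its p-values is not shown in this report, so treat this as a generic illustration rather than the iDEP method.

```r
# Made-up numbers for one cluster and one gene set
N <- 2000   # genes in the background
K <- 100    # background genes annotated to the pathway
n <- 150    # genes in the cluster
k <- 32     # cluster genes annotated to the pathway

# P(overlap >= k) under random sampling without replacement
p_value <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)

# With many pathways tested, adjust the raw p-values for multiple testing
raw_p <- c(p_value, 0.04, 0.5)   # the other two values are placeholders
adj_p <- p.adjust(raw_p, method = "fdr")
format(adj_p, digits = 3)
```

The `k - 1` in the call matters: `phyper` with `lower.tail = FALSE` returns the probability of strictly more than its first argument, so subtracting one gives the probability of at least k.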
input_seedTSNE <- 0 #Random seed for t-SNE
input_colorGenes <- TRUE #Color genes in t-SNE plot?
tSNEgenePlot() #Plot genes using t-SNE
input_selectFactors <- 'Sample_Name' #Factor coded by color
input_selectFactors2 <- 'Sample_Name' #Factor coded by shape
input_tsneSeed2 <- 0 #Random seed for t-SNE
#PCA, MDS and t-SNE plots
PCAplot()
## Warning: The shape palette can deal with a maximum of 6 discrete values because
## more than 6 becomes difficult to discriminate; you have 8. Consider
## specifying shapes manually if you must have them.
## Warning: Removed 16 rows containing missing values (geom_point).
MDSplot()
## Warning: The shape palette can deal with a maximum of 6 discrete values because
## more than 6 becomes difficult to discriminate; you have 8. Consider
## specifying shapes manually if you must have them.
## Warning: Removed 16 rows containing missing values (geom_point).
tSNEplot()
## Warning: The shape palette can deal with a maximum of 6 discrete values because
## more than 6 becomes difficult to discriminate; you have 8. Consider
## specifying shapes manually if you must have them.
## Warning: Removed 16 rows containing missing values (geom_point).
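The warnings above report 8 discrete values for the shape scale and 16 removed rows. One reading, consistent with the sample names in this report, is that shapes are mapped to 8 sample groups and that the two groups beyond the sixth level, with 8 replicates each, are left without a shape. The sketch below only checks that arithmetic in base R; how the iDEP plotting functions derive groups is an assumption here.

```r
# Sample names follow the pattern seen in this report's metadata table
samples <- c(paste0("Col0_FLT_Rep", 1:8), paste0("Col0_GC_Rep", 1:8),
             paste0("Cvi0_FLT_Rep", 1:6), paste0("Cvi0_GC_Rep", 1:6),
             paste0("Ler0_FLT_Rep", 1:6), paste0("Ler0_GC_Rep", 1:6),
             paste0("Ws2_FLT_Rep", 1:8),  paste0("Ws2_GC_Rep", 1:8))

# Strip the replicate suffix to get the group of each sample
groups <- factor(sub("_Rep[0-9]+$", "", samples))
n_groups <- nlevels(groups)   # more groups than the 6 default shapes

# Samples in factor levels beyond the sixth would get no default shape
beyond_six <- levels(groups)[-(1:6)]
n_unplotted <- sum(groups %in% beyond_six)
print(c(groups = n_groups, unplotted = n_unplotted))
```

If that reading is right, the Ws2 samples are missing from these plots, and coding shape by a factor with fewer levels (such as Gravity) would bring them back.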
#Read gene sets for pathway analysis using PGSEA on principal components
input_selectGO6 <- 'GOBP'
GeneSets.out <-readGeneSets( geneSetFile,
convertedData.out, input_selectGO6,input_selectOrg,
c(input_minSetSize, input_maxSetSize) )
PCApathway() # Run PGSEA analysis
## Warning: Package 'KEGG.db' is deprecated and will be removed from Bioconductor
## version 3.12
cat( PCA2factor() ) #The correlation between PCs with factors
##
## Correlation between Principal Components (PCs) with factors
## PC1 is correlated with Variety (p=7.62e-14).
## PC2 is correlated with Variety (p=9.29e-14).
## PC3 is correlated with Gravity (p=2.56e-05).
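To make the PC-to-factor association concrete, the sketch below runs PCA on toy data with prcomp() and tests whether PC1 scores differ between two conditions using ANOVA. The statistical test used inside PCA2factor() is not shown in this report, so the choice of ANOVA here is an assumption.

```r
set.seed(2)
# Toy data: 100 genes x 12 samples; the first 6 samples get a shift in 20 genes
x <- matrix(rnorm(100 * 12), nrow = 100)
x[1:20, 1:6] <- x[1:20, 1:6] + 4
condition <- factor(rep(c("A", "B"), each = 6))

# prcomp() expects samples in rows, so transpose the genes x samples matrix
pca <- prcomp(t(x))
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Test whether PC1 scores differ between the two conditions
p_pc1 <- summary(aov(pca$x[, 1] ~ condition))[[1]][["Pr(>F)"]][1]
cat(sprintf("PC1 explains %.0f%% of variance; association with condition p = %.2e\n",
            100 * var_explained[1], p_pc1))
```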
input_CountsDEGMethod <- 3 #DESeq2= 3,limma-voom=2,limma-trend=1
input_limmaPval <- 0.1 #FDR cutoff
input_limmaFC <- 2 #Fold-change cutoff
input_selectModelComprions <- 'Gravity: Microgravity vs. Terrestrial' #Selected comparisons
input_selectFactorsModel <- 'Gravity' #Selected factors for the model
input_selectInteractions <- NULL #Selected interaction terms
input_selectBlockFactorsModel <- NULL #Selected block factors
factorReferenceLevels.out <- c('Gravity:Microgravity')
limma.out <- limma()
## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors
DEG.data.out <- DEG.data()
limma.out$comparisons
## [1] "Microgravity-Terrestrial"
input_selectComparisonsVenn <- 'Microgravity-Terrestrial' #Selected comparisons for Venn diagram
input_UpDownRegulated <- FALSE #Split up and down regulated genes
vennPlot() # Venn diagram
sigGeneStats() # number of DEGs as figure
sigGeneStatsTable() # number of DEGs as table
## Comparisons Up Down
## Microgravity-Terrestrial Microgravity-Terrestrial 287 254
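The Up and Down counts depend on both the FDR cutoff and the fold-change cutoff set earlier. The sketch below shows one way such calls can be made from a results table. The data frame is made up, and the exact comparison operators used inside iDEP are an assumption.

```r
# Made-up results for five genes
res <- data.frame(
  gene   = c("g1", "g2", "g3", "g4", "g5"),
  log2FC = c(2.3, -1.4, 0.4, -3.0, 1.2),
  padj   = c(0.001, 0.04, 0.0001, 0.5, NA)
)

fdr_cutoff <- 0.1   # plays the role of input_limmaPval
min_fold   <- 2     # plays the role of input_limmaFC (on the linear scale)

# A gene is called only if it passes both cutoffs; NA padj counts as not called
passes <- !is.na(res$padj) & res$padj < fdr_cutoff &
          abs(res$log2FC) >= log2(min_fold)
res$call <- ifelse(!passes, "None", ifelse(res$log2FC > 0, "Up", "Down"))
table(res$call)
```

Note that the fold-change cutoff is compared on the log2 scale, so a cutoff of 2 corresponds to an absolute log2 fold-change of 1.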
input_selectContrast <- 'Microgravity-Terrestrial' #Selected comparisons
selectedHeatmap.data.out <- selectedHeatmap.data()
selectedHeatmap() # heatmap for DEGs in selected comparison
# Save gene lists and data into files
write.csv( selectedHeatmap.data()$genes, 'heatmap.data.csv')
write.csv(DEG.data(),'DEG.data.csv' )
write(AllGeneListsGMT() ,'AllGeneListsGMT.gmt')
input_selectGO2 <- 'GOBP' #Gene set category
geneListData.out <- geneListData()
volcanoPlot()
scatterPlot()
MAplot()
geneListGOTable.out <- geneListGOTable()
# Read pathway data again
GeneSets.out <-readGeneSets( geneSetFile,
convertedData.out, input_selectGO2,input_selectOrg,
c(input_minSetSize, input_maxSetSize) )
input_removeRedudantSets <- TRUE #Remove highly redundant gene sets?
results <- geneListGO() #Enrichment analysis
results$adj.Pval <- format( results$adj.Pval,digits=3 )
kable( results, row.names=FALSE)
| Direction | adj.Pval | nGenes | Pathways |
|---|---|---|---|
| Down regulated | 1.5e-11 | 13 | Hydrogen peroxide catabolic process |
| | 9.6e-11 | 13 | Antibiotic catabolic process |
| | 1.8e-10 | 13 | Hydrogen peroxide metabolic process |
| | 3.0e-10 | 16 | Drug catabolic process |
| | 5.3e-10 | 13 | Cofactor catabolic process |
| | 5.3e-10 | 39 | Oxidation-reduction process |
| | 1.0e-09 | 23 | Root system development |
| | 1.0e-09 | 23 | Root development |
| | 1.0e-09 | 14 | Cellular oxidant detoxification |
| | 2.3e-09 | 14 | Cellular detoxification |
| Up regulated | 5.4e-40 | 43 | Photosynthesis |
| | 7.9e-30 | 81 | Response to abiotic stimulus |
| | 1.5e-22 | 62 | Response to oxygen-containing compound |
| | 1.6e-22 | 40 | Response to temperature stimulus |
| | 2.0e-22 | 24 | Photosynthesis, light reaction |
| | 1.5e-19 | 45 | Response to inorganic substance |
| | 1.9e-18 | 54 | Oxidation-reduction process |
| | 4.2e-18 | 38 | Response to light stimulus |
| | 4.2e-18 | 32 | Generation of precursor metabolites and energy |
| | 5.4e-18 | 17 | Response to hydrogen peroxide |
STRING-db API access. We need to find the taxonomy ID of your species, which is used by STRING. First we try to guess the ID based on iDEP’s database. Users can also skip this step and assign the NCBI taxonomy ID directly, e.g. findTaxonomyID.out = 10090 (mouse is 10090, human is 9606, etc.).
STRING10_species = read.csv(STRING10_speciesFile)
ix = grep('Arabidopsis thaliana', STRING10_species$official_name )
findTaxonomyID.out <- STRING10_species[ix,1] # find taxonomyID
findTaxonomyID.out
## [1] 3702
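Because grep() matches patterns, a species name that is a substring of another entry could return several rows. The sketch below shows a stricter lookup by exact match on a toy table. `find_taxonomy_id` is a hypothetical helper, and the `species_id` column name is an assumption; only `official_name` appears in this report's code.

```r
# Toy stand-in for STRING10_species.csv; the species_id column name is an
# assumption, while official_name is the column used in this report
species <- data.frame(
  species_id    = c(3702, 10090, 9606),
  official_name = c("Arabidopsis thaliana", "Mus musculus", "Homo sapiens"),
  stringsAsFactors = FALSE
)

find_taxonomy_id <- function(name, table) {   # hypothetical helper
  ix <- which(table$official_name == name)    # exact match, not a pattern
  if (length(ix) != 1) {
    stop("Expected exactly one match for '", name, "', found ", length(ix))
  }
  table$species_id[ix]
}

find_taxonomy_id("Arabidopsis thaliana", species)
```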
Enrichment analysis using STRING
STRINGdb_geneList.out <- STRINGdb_geneList() #convert gene lists
# print(STRINGdb_geneList.out)
input_STRINGdbGO <- 'Process' #'Process', 'Component', 'Function', 'KEGG', 'Pfam', 'InterPro'
results <- stringDB_GO_enrichmentData() # enrichment using STRING
## Warning in string_db$get_enrichment(ids, category = input_STRINGdbGO, methodMT =
## "fdr", : methodMT parameter is depecated. Only FDR correction is available.
## Warning in string_db$get_enrichment(ids, category = input_STRINGdbGO, methodMT =
## "fdr", : iea parameter is deprecated.
## [1] "Process"
## Warning in string_db$get_enrichment(ids, category = input_STRINGdbGO, methodMT =
## "fdr", : methodMT parameter is depecated. Only FDR correction is available.
## Warning in string_db$get_enrichment(ids, category = input_STRINGdbGO, methodMT =
## "fdr", : iea parameter is deprecated.
## [1] "Process"
results$adj.Pval <- format( results$adj.Pval,digits=3 )
kable( results, row.names=FALSE)
| “No significant enrichment found.” | adj.Pval |
|---|---|
| No significant enrichment found. | NULL |
PPI network retrieval and analysis
input_nGenesPPI <- 100 #Number of top genes for PPI retrieval and analysis
stringDB_network1(1) #Show PPI network
Generating interactive PPI
write(stringDB_network_link(), 'PPI_results.html') # write results to html file
## Warning: 'string_db$get_link' is deprecated.
## Use 'Contact developers to request functionality' instead.
## See help("Deprecated")
## Warning: 'string_db$get_link' is deprecated.
## Use 'Contact developers to request functionality' instead.
## See help("Deprecated")
## Warning: 'string_db$get_link' is deprecated.
## Use 'Contact developers to request functionality' instead.
## See help("Deprecated")
browseURL('PPI_results.html') # open in browser
input_selectContrast1 <- 'Microgravity-Terrestrial' #select Comparison
#input_selectContrast1 = limma.out$comparisons[3] # manually set
input_selectGO <- 'GOBP' #Gene set category
#input_selectGO='custom' # if custom gmt file
input_minSetSize <- 15 #Min size for gene set
input_maxSetSize <- 2000 #Max size for gene set
# Read pathway data again
GeneSets.out <-readGeneSets( geneSetFile,
convertedData.out, input_selectGO,input_selectOrg,
c(input_minSetSize, input_maxSetSize) )
input_pathwayPvalCutoff <- 0.2 #FDR cutoff
input_nPathwayShow <- 30 #Top pathways to show
input_absoluteFold <- FALSE #Use absolute values of fold-change?
input_GenePvalCutoff <- 1 #FDR to remove genes
input_pathwayMethod = 1 # 1 GAGE
gagePathwayData.out <- gagePathwayData() # pathway analysis using GAGE
results <- gagePathwayData.out #Pathway analysis results from GAGE
results$adj.Pval <- format( results$adj.Pval,digits=3 )
kable( results, row.names=FALSE)
| Direction | GAGE analysis: Microgravity vs Terrestrial | statistic | Genes | adj.Pval |
|---|---|---|---|---|
| Up | Photosynthesis | 13.939 | 223 | 8.0e-34 |
| | Photosynthesis, light reaction | 11.8834 | 119 | 3.6e-23 |
| | Photosynthetic electron transport chain | 9.1037 | 46 | 5.9e-12 |
| | Plastid organization | 8.4502 | 256 | 9.7e-14 |
| | Photosynthesis, light harvesting | 7.5301 | 31 | 5.2e-08 |
| | Response to temperature stimulus | 7.3826 | 493 | 6.9e-11 |
| | Chloroplast organization | 7.213 | 197 | 4.5e-10 |
| | Photosynthesis, light harvesting in photosystem I | 7.1124 | 16 | 1.3e-05 |
| | Protein-chromophore linkage | 6.7788 | 39 | 2.0e-07 |
| | Response to high light intensity | 6.6976 | 68 | 1.2e-07 |
| | Generation of precursor metabolites and energy | 6.4229 | 391 | 3.2e-08 |
| | Response to light intensity | 6.3025 | 133 | 1.5e-07 |
| | Response to heat | 6.1163 | 188 | 2.4e-07 |
| | Photosystem II assembly | 5.9155 | 25 | 3.0e-05 |
| | Thylakoid membrane organization | 5.342 | 46 | 3.6e-05 |
| | Tetrapyrrole metabolic process | 5.2072 | 93 | 3.0e-05 |
| | Porphyrin-containing compound metabolic process | 5.1735 | 92 | 3.3e-05 |
| | Chlorophyll metabolic process | 5.122 | 81 | 4.3e-05 |
| | RNA modification | 5.0764 | 321 | 3.0e-05 |
| | Plastid membrane organization | 5.0191 | 49 | 1.0e-04 |
| | Tetrapyrrole biosynthetic process | 5.0078 | 70 | 7.8e-05 |
| | Chlorophyll biosynthetic process | 4.967 | 58 | 1.0e-04 |
| | Porphyrin-containing compound biosynthetic process | 4.9529 | 67 | 9.9e-05 |
| | Regulation of photosynthesis | 4.7432 | 37 | 3.9e-04 |
| | Response to cold | 4.7183 | 323 | 1.1e-04 |
| | Protein folding | 4.6828 | 168 | 1.7e-04 |
| | Protein targeting to chloroplast | 4.474 | 43 | 8.3e-04 |
| | Establishment of protein localization to chloroplast | 4.474 | 43 | 8.3e-04 |
| | Pigment biosynthetic process | 4.3211 | 130 | 7.7e-04 |
| | NcRNA metabolic process | 4.2248 | 425 | 8.3e-04 |
pathwayListData.out = pathwayListData()
enrichmentPlot(pathwayListData.out, 25 )
enrichmentNetwork(pathwayListData.out )
enrichmentNetworkPlotly(pathwayListData.out)
## Warning: `arrange_()` is deprecated as of dplyr 0.7.0.
## Please use `arrange()` instead.
## See vignette('programming') for more help
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
input_pathwayMethod = 3 # 3 fgsea
fgseaPathwayData.out <- fgseaPathwayData() #Pathway analysis using fgsea
## Warning in fgsea(pathways = gmt, stats = fold, minSize = input_minSetSize, :
## You are trying to run fgseaSimple. It is recommended to use fgseaMultilevel. To
## run fgseaMultilevel, you need to remove the nperm argument in the fgsea function
## call.
results <- fgseaPathwayData.out #Pathway analysis results from fgsea
results$adj.Pval <- format( results$adj.Pval,digits=3 )
kable( results, row.names=FALSE)
| Direction | GSEA analysis: Microgravity vs Terrestrial | NES | Genes | adj.Pval |
|---|---|---|---|---|
| Up | Photosynthesis | 3.0362 | 223 | 3.6e-03 |
| | Photosynthesis, light reaction | 2.893 | 119 | 3.6e-03 |
| | Response to high light intensity | 2.8105 | 68 | 3.6e-03 |
| | Photosynthetic electron transport chain | 2.6312 | 46 | 3.6e-03 |
| | Response to light intensity | 2.6175 | 133 | 3.6e-03 |
| | Response to hydrogen peroxide | 2.6142 | 64 | 3.6e-03 |
| | Response to heat | 2.5483 | 188 | 3.6e-03 |
| | Plastid organization | 2.5425 | 256 | 3.6e-03 |
| | Cellular response to heat | 2.5052 | 64 | 3.6e-03 |
| | Photosynthesis, light harvesting | 2.4697 | 31 | 3.6e-03 |
| | Chloroplast organization | 2.4508 | 197 | 3.6e-03 |
| | Protein-chromophore linkage | 2.4411 | 39 | 3.6e-03 |
| | Response to temperature stimulus | 2.3383 | 493 | 3.6e-03 |
| | Protein folding | 2.3377 | 168 | 3.6e-03 |
| | Generation of precursor metabolites and energy | 2.2887 | 391 | 3.6e-03 |
| | Porphyrin-containing compound metabolic process | 2.2857 | 92 | 3.6e-03 |
| | Thylakoid membrane organization | 2.2798 | 46 | 3.6e-03 |
| | Chlorophyll metabolic process | 2.2792 | 81 | 3.6e-03 |
| | Chaperone-mediated protein folding | 2.2768 | 60 | 3.6e-03 |
| | Tetrapyrrole metabolic process | 2.2759 | 93 | 3.6e-03 |
| | Heat acclimation | 2.2725 | 44 | 3.6e-03 |
| | Protein refolding | 2.261 | 46 | 3.6e-03 |
| | Photosystem II assembly | 2.2491 | 25 | 3.6e-03 |
| | Plastid membrane organization | 2.2394 | 49 | 3.6e-03 |
| | Regulation of photosynthesis | 2.2215 | 37 | 3.6e-03 |
| | Tetrapyrrole biosynthetic process | 2.2161 | 70 | 3.6e-03 |
| | de novo protein folding | 2.2143 | 53 | 3.6e-03 |
| | Protein peptidyl-prolyl isomerization | 2.2106 | 55 | 3.6e-03 |
| | Porphyrin-containing compound biosynthetic process | 2.2068 | 67 | 3.6e-03 |
| | Chlorophyll biosynthetic process | 2.1963 | 58 | 3.6e-03 |
pathwayListData.out = pathwayListData()
enrichmentPlot(pathwayListData.out, 25 )
enrichmentNetwork(pathwayListData.out )
enrichmentNetworkPlotly(pathwayListData.out)
PGSEAplot() # pathway analysis using PGSEA
##
## Computing P values using ANOVA
input_selectContrast2 <- 'Microgravity-Terrestrial' #select Comparison
#input_selectContrast2 = limma.out$comparisons[3] # manually set
input_limmaPvalViz <- 0.1 #FDR to filter genes
input_limmaFCViz <- 2 #Fold-change to filter genes
genomePlotly() # shows fold-changes on the genome
## Warning in eval(quote(list(...)), env): NAs introduced by coercion
## Warning in genomePlotly(): NAs introduced by coercion
input_nGenesBiclust <- 1000 #Top genes for biclustering
input_biclustMethod <- 'BCCC()' #Method: 'BCCC', 'QUBIC', 'runibic' ...
biclustering.out = biclustering() # run analysis
input_selectBicluster <- 1 #select a cluster
biclustHeatmap() # heatmap for selected cluster
input_selectGO4 <- 'GOBP' #Gene set category
# Read pathway data again
GeneSets.out <-readGeneSets( geneSetFile,
convertedData.out, input_selectGO4,input_selectOrg,
c(input_minSetSize, input_maxSetSize) )
results <- geneListBclustGO() #Enrichment analysis for the selected bicluster
results$adj.Pval <- format( results$adj.Pval,digits=3 )
kable( results, row.names=FALSE)
| adj.Pval | Genes | Pathways |
|---|---|---|
| 1.2e-115 | 287 | Response to abiotic stimulus |
| 3.2e-79 | 163 | Response to inorganic substance |
| 9.7e-75 | 227 | Response to organic substance |
| 1.7e-64 | 195 | Response to endogenous stimulus |
| 2.4e-64 | 193 | Response to hormone |
| 2.7e-57 | 180 | Response to oxygen-containing compound |
| 8.0e-48 | 92 | Response to metal ion |
| 8.2e-47 | 141 | Response to acid chemical |
| 5.1e-46 | 79 | Response to cadmium ion |
| 8.4e-45 | 157 | Cellular response to chemical stimulus |
input_mySoftPower <- 5 #SoftPower to cutoff
input_nGenesNetwork <- 1000 #Number of top genes
input_minModuleSize <- 20 #Module size minimum
wgcna.out = wgcna() # run WGCNA
## Warning: executing %dopar% sequentially: no parallel backend registered
## Power SFT.R.sq slope truncated.R.sq mean.k. median.k. max.k.
## 1 1 0.7180 3.180 0.738 326.00 332.000 438.0
## 2 2 0.3760 1.020 0.680 152.00 154.000 255.0
## 3 3 0.0671 0.301 0.785 83.90 83.200 167.0
## 4 4 0.0317 -0.181 0.893 51.10 49.500 117.0
## 5 5 0.1980 -0.435 0.973 33.30 31.000 86.0
## 6 6 0.4510 -0.692 0.976 22.90 20.200 66.2
## 7 7 0.5800 -0.977 0.971 16.30 13.700 55.1
## 8 8 0.6780 -1.230 0.962 12.00 9.570 46.9
## 9 9 0.7600 -1.440 0.967 9.10 6.960 40.7
## 10 10 0.8250 -1.560 0.980 7.04 5.220 35.8
## 11 12 0.8470 -1.780 0.963 4.45 3.040 28.6
## 12 14 0.8960 -1.840 0.971 2.98 1.820 23.4
## 13 16 0.9170 -1.800 0.977 2.08 1.130 19.5
## 14 18 0.9260 -1.790 0.966 1.51 0.718 16.5
## 15 20 0.9460 -1.770 0.981 1.13 0.468 14.1
## TOM calculation: adjacency..
## ..will not use multithreading.
## Fraction of slow calculations: 0.000000
## ..connectivity..
## ..matrix multiplication (system BLAS)..
## ..normalization..
## ..done.
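The table printed by WGCNA is normally used to choose the soft-thresholding power. The sketch below applies one common rule to the values copied from that table: take the lowest power whose scale-free fit reaches a cutoff while the slope is negative. The cutoff of 0.8 is an assumption (values between 0.8 and 0.9 are often used), and the variable names are illustrative.

```r
# Scale-free fit values copied from the WGCNA output above
sft <- data.frame(
  Power    = c(1:10, 12, 14, 16, 18, 20),
  SFT.R.sq = c(0.7180, 0.3760, 0.0671, 0.0317, 0.1980, 0.4510, 0.5800,
               0.6780, 0.7600, 0.8250, 0.8470, 0.8960, 0.9170, 0.9260, 0.9460),
  slope    = c(3.180, 1.020, 0.301, -0.181, -0.435, -0.692, -0.977,
               -1.230, -1.440, -1.560, -1.780, -1.840, -1.800, -1.790, -1.770)
)

r2_cutoff <- 0.8   # assumed threshold; 0.8 to 0.9 are commonly used values

# Scale-free topology requires a negative slope, so sign the fit accordingly
signed_r2 <- -sign(sft$slope) * sft$SFT.R.sq
candidates <- sft$Power[signed_r2 >= r2_cutoff]
suggested_power <- if (length(candidates) > 0) min(candidates) else NA
suggested_power
```

Under this rule the table points to a higher power than the value of 5 set in input_mySoftPower, where the fit is only 0.198, so it may be worth revisiting that setting.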
softPower() # soft power curve
modulePlot() # plot modules
listWGCNA.Modules.out = listWGCNA.Modules() #modules
input_selectGO5 <- 'GOBP' #Gene set category
# Read pathway data again
GeneSets.out <-readGeneSets( geneSetFile,
convertedData.out, input_selectGO5,input_selectOrg,
c(input_minSetSize, input_maxSetSize) )
input_selectWGCNA.Module <- 'Entire network' #Select a module
input_topGenesNetwork <- 10 #Number of top genes to show in the network
input_edgeThreshold <- 0.4 #Edge threshold
moduleNetwork() # show network of top genes in selected module
## softConnectivity: FYI: connecitivty of genes with less than 19 valid samples will be returned as NA.
## ..calculating connectivities..
input_removeRedudantSets <- TRUE #Remove redundant gene sets
results <- networkModuleGO() #Enrichment analysis of selected module
results$adj.Pval <- format( results$adj.Pval,digits=3 )
kable( results, row.names=FALSE)
| adj.Pval | Genes | Pathways |
|---|---|---|
| 1.2e-115 | 287 | Response to abiotic stimulus |
| 3.2e-79 | 163 | Response to inorganic substance |
| 9.7e-75 | 227 | Response to organic substance |
| 1.7e-64 | 195 | Response to endogenous stimulus |
| 2.4e-64 | 193 | Response to hormone |
| 2.7e-57 | 180 | Response to oxygen-containing compound |
| 8.0e-48 | 92 | Response to metal ion |
| 8.2e-47 | 141 | Response to acid chemical |
| 5.1e-46 | 79 | Response to cadmium ion |
| 8.4e-45 | 157 | Cellular response to chemical stimulus |